Human learning in non-Markovian decision making
Authors
Abstract
Humans can learn under a wide variety of feedback conditions. Particularly important types of learning fall under the category of reinforcement learning (RL), where a series of decisions must be made and only a sparse feedback signal is obtained. Computational and behavioral studies of RL have focused mainly on Markovian decision processes (MDPs), where the next state and reward depend only on the current state and action. Little is known about non-Markovian decision making in humans. Here we consider tasks in which the state transition function is still Markovian, but the reward function is non-Markovian. For example, learning to go from A to B is non-Markovian when receiving a reward at B is contingent on having visited a switch-state C before arriving at B. Learning is also non-Markovian when feedback is delayed and there is no unique mapping between feedback and state-action pairs. Classical RL algorithms can be categorized into value-based methods, such as temporal difference (TD) learning, and policy gradient methods. The former cannot cope with such non-Markovian conditions, whereas policy gradient methods can, although they are notoriously slow. Here, we show that humans can learn in both settings, with non-Markovian switch-states and with delayed feedback. Human learning with switch-states is nearly Bayes-optimal, whereas learning with delayed feedback is Bayes-suboptimal. Strikingly, both tasks are well modeled by a spiking neural network that uses a cascade of eligibility traces to implement a policy gradient procedure.
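To make the switch-state condition concrete, below is a minimal Python sketch, not the authors' spiking-network model, of a REINFORCE-style policy gradient learner on a short chain where the reward at goal B is delivered only if switch C was visited earlier in the episode. The chain layout, the single bit of working memory for "C visited", and all hyperparameters are illustrative assumptions; the point carried over from the abstract is that crediting every visited state-action pair with the episode's final return, as an eligibility-trace mechanism does, works even though the reward is non-Markovian in the observed position, whereas a one-step TD target based on the observed position alone would be ill-defined.

```python
# Minimal sketch (not the authors' code): REINFORCE-style policy gradient on a
# switch-state task. Reward at goal B requires an earlier visit to switch C, so
# the reward is non-Markovian in the observed position. The chain layout, the
# single memory bit and all hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N_POS = 5                          # positions 0..4 on a chain
START, SWITCH, GOAL = 1, 0, 4      # A = 1, C = 0 (off the direct A->B path), B = 4
ACTIONS = (-1, +1)                 # step left / step right

# Tabular policy logits over (position, memory-bit) pairs.
theta = np.zeros((N_POS, 2, len(ACTIONS)))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def run_episode(max_steps=40):
    pos, mem, trace = START, 0, []
    for _ in range(max_steps):
        p = softmax(theta[pos, mem])
        a = rng.choice(len(ACTIONS), p=p)
        trace.append((pos, mem, a))              # remember every visited pair (eligibility)
        pos = int(np.clip(pos + ACTIONS[a], 0, N_POS - 1))
        if pos == SWITCH:
            mem = 1                              # internal memory of having visited C
        if pos == GOAL:
            return trace, float(mem)             # non-Markovian reward: needs earlier visit to C
    return trace, 0.0

ALPHA = 0.1
for _ in range(3000):
    trace, R = run_episode()
    for pos, mem, a in trace:
        grad = -softmax(theta[pos, mem])
        grad[a] += 1.0                           # d log pi(a | pos, mem) / d theta
        theta[pos, mem] += ALPHA * R * grad      # credit the whole episode with its return

print("P(step right | pos, mem=0):", np.round([softmax(theta[p, 0])[1] for p in range(N_POS)], 2))
print("P(step right | pos, mem=1):", np.round([softmax(theta[p, 1])[1] for p in range(N_POS)], 2))
```

After a few thousand episodes the printed policy steps left toward C while the memory bit is 0 and right toward B once it is 1; unvisited (position, memory) pairs remain at 0.5.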
Similar articles
Human and Machine Learning in Non-Markovian Decision Making
Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), where a series of rewarded decisions must be made, is a particularly important type of learning. Computational and behavioral studies of RL have focused mainly on Markovian decision processes, where the next state depends only on the current state and action. Little is known about non-Markovian decision making ...
Non-Deterministic Policies in Markovian Decision Processes
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct decision support systems for action selection in Markovian environments. Although conventional methods ...
Non-Deterministic Policies in Markovian Processes
Markovian processes have long been used to model stochastic environments. Reinforcement learning has emerged as a framework to solve sequential planning and decision-making problems in such environments. In recent years, attempts were made to apply methods from reinforcement learning to construct adaptive treatment strategies, where a sequence of individualized treatments is learned from clinical ...
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a mapping from states and actions to their utilities. An important assumption of Q-learning is the Markovian environment assumption, meaning that any information needed to determine the optimal action ...
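To make the "mapping from states and actions to their utilities" concrete, the standard tabular Q-learning update is sketched below. It is generic textbook code, not code from the cited paper; the state/action counts and hyperparameters are placeholder assumptions.

```python
# Generic tabular Q-learning update (illustrative, not from the cited paper).
import numpy as np

n_states, n_actions = 6, 2                 # placeholder problem size
Q = np.zeros((n_states, n_actions))        # utility estimates Q(s, a)
alpha, gamma = 0.1, 0.95                   # assumed learning rate and discount factor

def q_update(s, a, r, s_next):
    # Markov assumption at work: the target uses only the current transition (s, a, r, s').
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```

When the reward or transition also depends on earlier, unobserved history, this one-step target is no longer a well-defined function of (s, a), which is what motivates the memory-based approaches discussed in that paper.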
HQ-Learning: Discovering Markovian Subgoals for Non-Markovian Reinforcement Learning
To solve partially observable Markov decision problems, we introduce HQ-learning, a hierarchical extension of Q-learning. HQ-learning is based on an ordered sequence of subagents, each learning to identify and solve a Markovian subtask of the total task. Each agent learns (1) an appropriate subgoal (though there is no intermediate, external reinforcement for "good" subgoals), and (2) a Markovian ...
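Below is a structural sketch of the control scheme this abstract describes: an ordered sequence of subagents, each picking a subgoal and acting with its own Markovian, Q-table-based policy until that subgoal is reached. It shows only the hand-over logic, not the actual HQ-learning update rules from the paper; the SubAgent fields, the greedy subgoal choice, and the Gym-style env interface are illustrative assumptions.

```python
# Structural sketch of the subagent hand-over described in the abstract,
# not the full HQ-learning algorithm. The environment is a placeholder with a
# Gym-style reset()/step() interface (an assumption for illustration).
import numpy as np

class SubAgent:
    def __init__(self, n_states, n_actions):
        self.Q = np.zeros((n_states, n_actions))   # Markovian policy for this subtask
        self.HQ = np.zeros(n_states)               # preferences over candidate subgoal states

    def pick_subgoal(self):
        return int(self.HQ.argmax())               # hypothetical greedy subgoal choice

    def act(self, s, eps=0.1):
        if np.random.rand() < eps:
            return int(np.random.randint(self.Q.shape[1]))
        return int(self.Q[s].argmax())

def run_episode(env, agents, max_steps=100):
    """Hand control from one subagent to the next whenever its subgoal is reached."""
    s = env.reset()
    idx = 0
    goal = agents[idx].pick_subgoal()
    for _ in range(max_steps):
        a = agents[idx].act(s)
        s, r, done = env.step(a)
        if s == goal and idx + 1 < len(agents):    # subtask solved: transfer control
            idx += 1
            goal = agents[idx].pick_subgoal()
        if done:
            break
```

The point of the decomposition is that each subagent only ever faces a Markovian subtask, even though the overall problem is only partially observable.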